* **Naive RAG:** Uses simple vector similarity for direct, fact-based queries.
* **Multimodal RAG:** Retrieves information across various formats, including text, images, and audio.
* **HyDE (Hypothetical Document Embeddings):** Generates a hypothetical answer first, then uses its embedding to retrieve real documents more effectively.
* **Corrective RAG:** Verifies retrieved data against trusted sources to ensure accuracy.
* **Graph RAG:** Utilizes knowledge graphs to capture complex relationships between entities.
* **Hybrid RAG:** Combines vector-based retrieval with graph-based methods for richer context.
* **Adaptive RAG:** Dynamically switches between simple retrieval and complex reasoning based on the query.
* **Agentic RAG:** Employs AI agents to manage complex workflows involving multiple tools and sources.
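The simplest variant above, naive RAG, can be sketched in a few lines: embed the query and the documents, rank by cosine similarity, and pass the top hits to a generator. The hash-bucket `embed` function below is a toy stand-in for illustration only; a real system would use a learned embedding model.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy deterministic embedding: hash each token into a fixed-size bucket vector.
    A stand-in for a learned embedding model, used here only to make the sketch runnable."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.strip(".,?!").encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Naive RAG retrieval: rank documents by cosine similarity to the query embedding."""
    doc_vecs = np.stack([embed(d) for d in docs])
    scores = doc_vecs @ embed(query)  # dot product of unit vectors = cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

docs = [
    "RAG retrieves documents before generation.",
    "Knowledge graphs model entity relationships.",
    "Cosine similarity compares embedding vectors.",
]
print(retrieve("how does RAG retrieve documents", docs, k=1))
```

The retrieved passages would then be prepended to the LLM prompt; the fancier variants (corrective, adaptive, agentic) mostly add machinery around this same retrieve-then-generate core.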
This article discusses how AI tools can be used to enhance the reading experience by providing instant access to information and background details, similar to using a dictionary or Wikipedia, but with the ability to ask more complex questions. The author shares personal examples of using AI while reading 'The Dark Forest' and other books to clarify plot points and gain a better understanding of the material.
This article compares the performance of LLM embeddings, TF-IDF, and Bag of Words for text vectorization and information retrieval tasks using scikit-learn. It provides a practical comparison with code examples and discusses the strengths and weaknesses of each approach.
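The kind of comparison the article runs can be reproduced in a few lines of scikit-learn. The toy corpus and query below are illustrative, not taken from the article; the key API detail is that the query must be transformed with the vectorizer's vocabulary, not a fresh fit.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply today",
]
query = ["a cat on a mat"]

for name, vec in [("BoW", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    doc_matrix = vec.fit_transform(corpus)  # learn vocabulary from the corpus
    query_vec = vec.transform(query)        # reuse that vocabulary for the query
    sims = cosine_similarity(query_vec, doc_matrix)[0]
    best = sims.argmax()
    print(f"{name}: best match = {corpus[best]!r} (score {sims[best]:.2f})")
```

TF-IDF down-weights common words like "the", which is usually enough to beat raw counts on retrieval; LLM embeddings additionally capture synonymy that neither sparse method can.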
This paper reports on an experiment to build a domain-aware Japanese text-embedding approach to improve the quality of search at Mercari, Japan's largest C2C marketplace.
This article explores the architecture enabling AI chatbots to perform web searches, covering retrieval-augmented generation (RAG), vector databases, and the challenges of integrating search with LLMs.
This paper addresses the misalignment between traditional IR evaluation metrics and the requirements of modern Retrieval-Augmented Generation (RAG) systems. It proposes a novel annotation schema and the UDCG metric to better evaluate retrieval quality for LLM consumers.
This article details the process of building a fast vector search system for a large legal dataset (Australian High Court decisions). It covers choosing embedding providers, performance benchmarks, using USearch and Isaacus embeddings, and the importance of API terms of service. It focuses on achieving speed and scalability while maintaining reasonable accuracy.
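Before reaching for an approximate index like the USearch setup the article benchmarks, the baseline worth measuring is exact brute-force search: one matrix-vector product plus a partial sort. The sketch below uses random unit vectors as stand-ins for real document embeddings (the article uses Isaacus embeddings); the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real document embeddings: 10k docs x 128 dims, L2-normalized
# so that a dot product equals cosine similarity.
docs = rng.standard_normal((10_000, 128)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact brute-force cosine search over all documents."""
    q = query / np.linalg.norm(query)
    scores = docs @ q
    top = np.argpartition(scores, -k)[-k:]     # unordered top-k in O(n)
    return top[np.argsort(scores[top])[::-1]]  # sort only those k, best first

hits = search(docs[42], k=5)
print(hits)
```

At this scale exact search is already fast; approximate indexes such as USearch pay off at millions of vectors, trading a little recall for much lower latency, which is the speed/accuracy balance the article explores.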
In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs for industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. We conduct comprehensive experiments on large-scale internal video recommendation datasets and demonstrate substantial retrieval improvements over a heavily optimized production model.
A blog post comparing when to use regular Google search versus LLMs for research, outlining the strengths and weaknesses of each. It details scenarios where search engines excel (facts, current events, specific sources) and where LLMs shine (analysis, synthesis, creative thinking). It also lists tasks LLMs struggle with, such as complex reasoning, real-time information, and fact verification.
This blog post details an experiment testing the ability of LLMs (Gemini, ChatGPT, Perplexity) to accurately retrieve and summarize recent blog posts from a specific URL (searchresearch1.blogspot.com). The author found significant issues with hallucinations and inaccuracies, even in models claiming live web access, highlighting the unreliability of LLMs for even simple research tasks.